Reinforcement Learning with Preferences
Authors
Abstract
In this work, we propose a framework for learning with preferences that combines neurophysiological findings, prospect theory, and the classic reinforcement learning mechanism. Specifically, we extend the state representation of reinforcement learning with a multi-dimensional preference model controlled by an external state. This external state is designed to be independent of the reinforcement learning process, so it can be driven by an external process that simulates the agent’s knowledge and experience while preserving all major properties of reinforcement learning. Finally, numerical experiments show that the proposed method is capable of learning different preferences in a manner sensitive to the agent’s level of experience.
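The idea of extending the state with an externally controlled preference state can be illustrated with a toy sketch. This is not the authors' implementation: the two-action task, the outcome vectors, the preference weights, and the fixed schedule that drives the external state are all hypothetical choices made for illustration, and prospect theory is not modelled here. The sketch only shows a tabular value update over an augmented state in which the external state is set by a process that ignores the agent's actions.

```python
import random

# Hypothetical preference weights per external state (an assumption, not from the paper):
# external state 0 ("novice") weighs safety, external state 1 ("experienced") weighs payoff.
PREFERENCE_WEIGHTS = {0: (0.2, 0.8), 1: (0.9, 0.1)}

# Toy one-step task: each action yields a fixed (payoff, safety) outcome vector.
OUTCOMES = {0: (1.0, 0.0), 1: (0.3, 1.0)}


def scalar_reward(action, external_state):
    """Collapse the multi-dimensional outcome using the weights of the external state."""
    weights = PREFERENCE_WEIGHTS[external_state]
    outcome = OUTCOMES[action]
    return sum(w * o for w, o in zip(weights, outcome))


def train(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular value learning over the augmented state (external_state, action)."""
    rng = random.Random(seed)
    q = {(e, a): 0.0 for e in PREFERENCE_WEIGHTS for a in OUTCOMES}
    for episode in range(episodes):
        # External process: a fixed schedule that does not depend on the agent's actions.
        e = 0 if episode < episodes // 2 else 1
        if rng.random() < epsilon:
            a = rng.choice(list(OUTCOMES))  # explore
        else:
            a = max(OUTCOMES, key=lambda act: q[(e, act)])  # exploit
        r = scalar_reward(a, e)
        # Episodes last one step, so the update has no bootstrap term.
        q[(e, a)] += alpha * (r - q[(e, a)])
    return q


def greedy_action(q, external_state):
    """Return the action with the highest learned value for the given external state."""
    return max(OUTCOMES, key=lambda act: q[(external_state, act)])
```

Under these assumed weights, the learned greedy action differs between the two external states, which is the kind of experience-dependent preference the abstract describes. Calling `train()` and then `greedy_action(q, e)` for each external state `e` shows the difference.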
Related resources
Preference elicitation and inverse reinforcement learning
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent’s preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relati...
Learning User Preferences in Ubiquitous Systems: A User Study and a Reinforcement Learning Approach
Our study concerns a virtual assistant that proposes services to the user based on the user's currently perceived activity and situation (ambient intelligence). Instead of asking the user to define his preferences, we acquire them automatically using a reinforcement learning approach. Experiments showed that our system succeeded in learning user preferences. In order to validate the relevance and usabi...
Exploitation of User’s Preferences in Reinforcement Learning Decision Support Systems
A system called COLMAS (COordination Learning in Multi-Agent System) has been developed to investigate how the integration of realistic geosimulation and reinforcement learning might support a decision-maker in the context of cooperative patrolling. COLMAS is a model-driven automated decision support system combining geosimulation and reinforcement learning to compute near optimal solutions. Bu...
Learning from Trajectory-Based Action Preferences
Conventional reinforcement learning algorithms depend on the availability of a numerical feedback signal. In many domains, this is not readily available and, in fact, constitutes an additional parameter of the problem setting. As a consequence, a fair amount of engineering is required in order to find a reasonable configuration of reward signal and algorithm parameters that facilitates an effic...